Frequency-based haplotype reconstruction from deep sequencing data of bacterial populations

Authors

  • Sergio Pulido-Tamayo
  • Aminael Sánchez-Rodríguez
  • Toon Swings
  • Bram Van den Bergh
  • Akanksha Dubey
  • Hans Steenackers
  • Jan Michiels
  • Jan Fostier
  • Kathleen Marchal

Abstract

Clonal populations accumulate mutations over time, resulting in different haplotypes. Deep sequencing of such a population in principle provides the information needed to reconstruct these haplotypes and the frequencies at which they occur. However, this reconstruction is technically nontrivial, especially in clonal systems with a relatively low mutation frequency. The low number of segregating sites in those systems adds ambiguity to the haplotype phasing and thus hampers the reconstruction of genome-wide haplotypes based on sequence overlap information alone. Therefore, we present EVORhA, a haplotype reconstruction method that complements phasing information in the non-empty read overlap with the frequency estimations of inferred local haplotypes. As shown with simulated data, as soon as read lengths and/or mutation rates become restrictive for state-of-the-art methods, the use of this additional frequency information allows EVORhA to still reliably reconstruct genome-wide haplotypes. On real data, we show the applicability of the method in reconstructing the population composition of evolved bacterial populations and in decomposing mixed bacterial infections from clinical samples.
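
The intuition that local haplotypes belonging to the same genome-wide haplotype should occur at similar frequencies can be illustrated with a toy sketch. This is not EVORhA's actual algorithm: the function name `link_by_frequency`, the example windows, and the simple rank-based matching are all assumptions made for illustration, and the sketch presumes that two windows share no reads and contain the same number of local haplotypes.

```python
def link_by_frequency(window_a, window_b):
    """Pair local haplotypes from two windows by matching their frequencies.

    Hypothetical helper, not part of EVORhA. Each window is a dict that maps
    a local haplotype string to its estimated frequency. Sorting both windows
    by frequency and pairing them rank by rank minimises the total absolute
    frequency difference between paired haplotypes.
    """
    if len(window_a) != len(window_b):
        raise ValueError("windows must contain the same number of local haplotypes")
    haps_a = sorted(window_a, key=window_a.get, reverse=True)
    haps_b = sorted(window_b, key=window_b.get, reverse=True)
    # Report each linked pair with the mean of its two frequency estimates.
    return [(a, b, (window_a[a] + window_b[b]) / 2) for a, b in zip(haps_a, haps_b)]


# Made-up frequency estimates for two windows with no overlapping reads.
window_a = {"AC": 0.60, "AT": 0.30, "GC": 0.10}
window_b = {"TT": 0.12, "CG": 0.58, "CT": 0.30}

links = link_by_frequency(window_a, window_b)
for hap_a, hap_b, freq in links:
    print(f"{hap_a} + {hap_b}: {freq:.2f}")
```

Rank-based matching breaks down when two local haplotypes have nearly equal frequencies, which is one reason a real method would need to combine frequency information with read-overlap evidence rather than rely on frequencies alone.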

Similar resources

Ultra-deep sequencing for the analysis of viral populations.

Next-generation sequencing allows for cost-effective probing of virus populations at an unprecedented level of detail. The massively parallel sequencing approach can detect low-frequency mutations and it provides a snapshot of the entire virus population. However, analyzing ultra-deep sequencing data obtained from diverse virus populations is challenging because of PCR and sequencing errors and...

Benchmarking of viral haplotype reconstruction programmes: an overview of the capacities and limitations of currently available programmes

Viral haplotype reconstruction from a set of observed reads is one of the most challenging problems in bioinformatics today. Next-generation sequencing technologies enable us to detect single-nucleotide polymorphisms (SNPs) of haplotypes, even if the haplotypes appear at low frequencies. However, there are two major problems. First, we need to distinguish real SNPs from sequencing errors. Second...

Benchmarking of Viral Haplotype Reconstruction Programmes: An overview of the capacities and limitations of the currently available programmes

Viral haplotype reconstruction from a set of observed reads is one of the most challenging problems in bioinformatics today. Next-generation sequencing (NGS) technologies enable us to detect single nucleotide polymorphisms (SNPs) of haplotypes even if the haplotypes appear at low frequencies. However, there are two major problems. First, we need to distinguish real SNPs from sequencing errors. ...

Population structure of sea cucumber Holothuria parva by 16S rRNA mitochondrial in the coasts of Bushehr and Halileh from Persian Gulf

Population structure of sea cucumber Holothuria parva in the coasts of Bushehr and Halileh from Persian Gulf was determined by 16S rRNA of mitochondrial genome sequencing in autumn and winter seasons of 2019. In Bushehr and Halileh populations, 2 and 4 haplotypes were identified out of 374 nucleotide sites, respectively, and haplotype 2 was the most abundant in Bushehr population and was observ...

Next-Generation Sequencing of HIV-1 RNA Genomes: Determination of Error Rates and Minimizing Artificial Recombination

Next-generation sequencing (NGS) is a valuable tool for the detection and quantification of HIV-1 variants in vivo. However, these technologies require detailed characterization and control of artificially induced errors to be applicable for accurate haplotype reconstruction. To investigate the occurrence of substitutions, insertions, and deletions at the individual steps of RT-PCR and NGS, 454...

Journal title:

Volume 43, Issue

Pages -

Publication date: 2015